# CLIP Visual Encoding
**Clip Vit Base Patch32 Stanford Cars** (tanganke) — downloads: 4,143 · likes: 1
A visual classification model based on the CLIP Vision Transformer architecture, fine-tuned on the Stanford Cars dataset.
Task: Image Classification · Library: Transformers
**Taiyi CLIP Roberta 102M Chinese** (IDEA-CCNL) — license: Apache-2.0 · downloads: 558 · likes: 51
The first open-source Chinese CLIP model, pre-trained on 123 million image–text pairs, with a text encoder based on the RoBERTa-base architecture.
Task: Text-to-Image · Library: Transformers · Language: Chinese
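Both models above inherit CLIP's core mechanism: an image embedding and candidate text embeddings are L2-normalized, compared by temperature-scaled cosine similarity, and the similarities are softmaxed into match probabilities. A minimal NumPy sketch of that scoring step (the function name, embedding dimension, and temperature value are illustrative assumptions, not taken from either model card):

```python
import numpy as np

def clip_similarity(image_emb, text_embs, temperature=0.07):
    # L2-normalize embeddings, as CLIP does before the dot product
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    # Temperature-scaled cosine similarities between the image and each caption
    logits = img @ txt.T / temperature
    # Numerically stable softmax over the candidate texts
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)          # one image embedding (dim assumed)
text_embs = rng.normal(size=(3, 512))     # e.g. three candidate captions
probs = clip_similarity(image_emb, text_embs)
```

Zero-shot classification, as in the Stanford Cars model, is this same computation with one caption per class label.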